fix(coreapi): per-plugin panic tracking with unhealthy signal (PILOT-254) #9
Open
matthew-pilot wants to merge 1 commit into
…254)

RecoverPlugin now tracks per-plugin panic counts (sync.Map) and marks a plugin unhealthy after maxPanicsBeforeUnhealthy (3) panics, publishing a one-shot "plugin.<name>.unhealthy" event on the bus.

New exported API:
- PluginPanicCount(name): per-plugin panic count
- IsPluginHealthy(name): false when threshold exceeded
- ResetPluginHealthForTest(): test cleanup

The daemon supervisor (web4) can react to the unhealthy event by restarting or unloading the plugin. The TODO in recover.go is resolved for the tracking/signaling layer.

Closes PILOT-254
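The tracking described in the commit message could look roughly like the sketch below. It is not the code from this PR: the `pluginHealth` struct, the `recordPanic` helper, and the `publish` hook are hypothetical stand-ins, and it assumes the plugin becomes unhealthy on its third panic (the commit message says "after 3 panics", which may also be read as "more than 3").

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// maxPanicsBeforeUnhealthy mirrors the threshold named in the commit message.
const maxPanicsBeforeUnhealthy = 3

// pluginHealth is a hypothetical per-plugin record; the PR's real layout is not shown.
type pluginHealth struct {
	panics    atomic.Uint64
	unhealthy atomic.Bool
}

// health maps plugin name -> *pluginHealth.
var health sync.Map

// publish stands in for the event bus, whose API is not shown in the PR.
var publish = func(topic string) { fmt.Println("event:", topic) }

// recordPanic is a hypothetical helper that RecoverPlugin might call after
// recovering a panic. It returns the plugin's updated panic count.
func recordPanic(name string) uint64 {
	v, _ := health.LoadOrStore(name, &pluginHealth{})
	h := v.(*pluginHealth)
	n := h.panics.Add(1)
	// Assumption: the plugin flips to unhealthy on its third panic.
	// CompareAndSwap succeeds only once, so the event is one-shot.
	if n >= maxPanicsBeforeUnhealthy && h.unhealthy.CompareAndSwap(false, true) {
		publish("plugin." + name + ".unhealthy")
	}
	return n
}

// PluginPanicCount returns the panic count, or 0 for an unknown plugin.
func PluginPanicCount(name string) uint64 {
	if v, ok := health.Load(name); ok {
		return v.(*pluginHealth).panics.Load()
	}
	return 0
}

// IsPluginHealthy reports false once the threshold has been reached.
func IsPluginHealthy(name string) bool {
	if v, ok := health.Load(name); ok {
		return !v.(*pluginHealth).unhealthy.Load()
	}
	return true
}

// ResetPluginHealthForTest clears all tracked state.
func ResetPluginHealthForTest() {
	health.Range(func(k, _ any) bool {
		health.Delete(k)
		return true
	})
}

func main() {
	for i := 0; i < 4; i++ {
		recordPanic("demo") // "demo" is a made-up plugin name
	}
	fmt.Println(PluginPanicCount("demo"), IsPluginHealthy("demo"))
}
```

Using `CompareAndSwap` on the unhealthy flag is one way to keep the event one-shot even when several goroutines of the same plugin panic concurrently.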
Codecov Report: ✅ All modified and coverable lines are covered by tests.
🤖 matthew-pilot Status: PR #9, PILOT-254
📋 matthew-pilot Explain: PR #9 (PILOT-254)
What this does
Adds per-plugin panic tracking to
Changes
Risk / Tier
Jira
🦾 Matthew PR Status: #9
Tickets: None detected in title
Labels: matthew-fix-larger
🦾 Auto-generated status check by matthew-pr-worker
What failed

`RecoverPlugin` caught panics but had no per-plugin tracking or unhealthy signal. A plugin whose goroutines panicked repeatedly kept running with inconsistent state, and no supervisor could detect the degradation.

What changed

`coreapi/recover.go` now tracks per-plugin panic counts via `sync.Map` and emits a one-shot `plugin.<name>.unhealthy` event when a plugin exceeds 3 panics (`maxPanicsBeforeUnhealthy`).

New public API:
- `PluginPanicCount(name string) uint64`: per-plugin panic counter
- `IsPluginHealthy(name string) bool`: false when threshold exceeded
- `ResetPluginHealthForTest()`: test cleanup

The daemon supervisor (web4 daemon) can subscribe to `plugin.*.unhealthy` events and react by restarting or unloading the degraded plugin.

Verification

- `go build ./...` ✅
- `go vet ./...` ✅
- `go test ./...` ✅ (all 13 packages, 32s)
- `TestL11PerPluginUnhealthy` validates: count tracking, healthy→unhealthy transition, one-shot event

Closes PILOT-254
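On the consumer side, a supervisor reacting to `plugin.*.unhealthy` might be sketched as follows. The bus API is not shown in this PR, so a plain channel of topic strings stands in for a subscription; `unhealthyPlugin`, `supervise`, the `restart` callback, and the plugin names are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// unhealthyPlugin extracts <name> from a "plugin.<name>.unhealthy" topic.
// It returns false for any other topic or for an empty name.
func unhealthyPlugin(topic string) (string, bool) {
	const prefix, suffix = "plugin.", ".unhealthy"
	if len(topic) <= len(prefix)+len(suffix) ||
		!strings.HasPrefix(topic, prefix) ||
		!strings.HasSuffix(topic, suffix) {
		return "", false
	}
	return topic[len(prefix) : len(topic)-len(suffix)], true
}

// supervise drains topics from a channel (a stand-in for a bus subscription)
// and calls restart once for every unhealthy event it sees.
func supervise(topics <-chan string, restart func(name string)) {
	for t := range topics {
		if name, ok := unhealthyPlugin(t); ok {
			restart(name)
		}
	}
}

func main() {
	topics := make(chan string, 3)
	topics <- "plugin.dht.unhealthy" // made-up plugin names
	topics <- "plugin.dht.started"   // ignored: not an unhealthy event
	topics <- "plugin.relay.unhealthy"
	close(topics)

	supervise(topics, func(name string) {
		fmt.Println("restarting", name)
	})
}
```

Because the event is one-shot per plugin, a supervisor along these lines would presumably need the health state reset after a restart before it could be notified again; how the real daemon handles that is not covered by this PR.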